Narrowband to wideband feature expansion for robust multilingual ASR
نویسنده
چکیده
To build high quality wideband acoustic models for automatic speech recognition (ASR), a large amount of wideband speech training data is required. However, for a particular language, one may have available a lot of narrowband data, but only a limited amount of wideband data. This paper deals with such situation and proposes a narrowband to wideband expansion algorithm that expands the narrowband signal ASR features to wideband ASR features. The algorithm is tested in two practical situations comprising sufficient amount and insufficient amount of original wideband training data. Tests show that using a combination of wideband features and expanded features does not harm the ASR performance when having a sufficient amount of the original wideband data, and it improves the ASR performance significantly when only a limited amount of wideband data is originally available. In the presented multilingual tests, a unique expansion model is trained for four languages from the Speecon database. Availability of different amounts of wideband training data is considered, including the case when no wideband data is available. ASR experiments for each language confirm that the addition of expanded features to the wideband model training enhances the models and provides better results than using the limited amount of wideband data only. In all tests, the ETSI standard noise-robust front-end is used.
منابع مشابه
DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech
We propose a number of enhancement techniques to improve speech quality in bandwidth expansion (BWE) from narrowband to wideband speech, addressing three issues, which could be critical in real-world applications, namely: (1) discontinuity between narrowband spectrum and the estimated high frequency spectrum, (2) energy mismatch between testing and training utterances, and (3) expanding bandwid...
متن کاملWindowing Effects of Short Time Fourier Transform on Wideband Array Signal Processing Using Maximum Likelihood Estimation
During the last two decades, Maximum Likelihood estimation (ML) has been used to determine Direction Of Arrival (DOA) and signals propagated by the sources, using narrowband array signals. The algorithm fails in the case of wideband signals. As an attempt by the present study to overcome the problem, the array outputs are transformed into narrowband frequency bins, using short time Fourier tran...
متن کاملWindowing Effects of Short Time Fourier Transform on Wideband Array Signal Processing Using Maximum Likelihood Estimation
During the last two decades, Maximum Likelihood estimation (ML) has been used to determine Direction Of Arrival (DOA) and signals propagated by the sources, using narrowband array signals. The algorithm fails in the case of wideband signals. As an attempt by the present study to overcome the problem, the array outputs are transformed into narrowband frequency bins, using short time Fourier tran...
متن کاملRobust Bandwidth Extension of Noise-co
We present a new bandwidth extension algorithm for converting narrowband telephone speech into wideband speech using a transformation in the mel cepstral domain. Unlike previous approaches, the proposed method is designed specifically for bandwidth extension of narrowband speech that has been corrupted by environmental noise. We show that by exploiting previous research in mel cepstrum feature ...
متن کاملWTIMIT: The TIMIT Speech Corpus Transmitted Over The 3G AMR Wideband Mobile Network
In anticipation of upcoming mobile telephony services with higher speech quality, a wideband (50 Hz to 7 kHz) mobile telephony derivative of TIMIT has been recorded called WTIMIT. It opens up various scientific investigations; e.g., on speech quality and intelligibility, as well as on wideband upgrades of network-side interactive voice response (IVR) systems with retrained or bandwidth-extended...
متن کامل